AN ANALYSIS OF THE MAX-MIN TEXTURE MEASURE


INTRODUCTION 

In lieu of a satisfactory mathematical description of image scene content,
the U.S. Army Engineer Topographic Laboratories (ETL), Computer Sciences
Laboratory (CSL), has embarked on a heuristic approach to information
extraction from digital and digitized images, in response to a Defense Mapping
Agency (DMA) requirement for improving current methods of determining
Digital Landmass System (DLMS) data. This research note describes the first
of many experiments in which texture is used to specify a signature that, in turn,
is used to discriminate among a variety of candidates for classification.


The scene recognition problem under study pertains only to mapping,
charting, and geodetic data, of which the DLMS is a subset. The initial
investigations are concerned primarily with passive sensor records rather than
with active ones. Consideration is given to limitations imposed on any process
(1) by an inadequate mathematical model that causes false starts and ad hoc
solutions; (2) by the computer, which allows a local-only view of the scene
rather than a global one; and (3) by the sensor record itself, which is a
two-dimensional distorted version of a three-dimensional world wherein detail
is obscured and, in many cases, not even resolved.


The CSL approach toward scene recognition is an attempt to combine
the heretofore separate processes of elevation determination and scene
classification into a cooperative venture that uses one process to strengthen
the other and enables manual intervention through computer display devices.
The approach has evolved over the last several years as a result of in-house
work and contractor studies at ETL. An overview of the approach was first
expressed in a paper presented to the Congress of the International Federation
of Surveyors,1 and later in a paper presented to the Naval Postgraduate School.2


1Michael A. Crombie and Lawrence A. Gambino, "Digital Stereo Photogrammetry," Prepared for Congress
of the International Federation of Surveyors (FIG), Commission V, Stockholm, Sweden, June 1977.

2Lawrence A. Gambino and Michael A. Crombie, "Manipulation and Display of Digital Topographic Data,"
Prepared for the Second Symposium on Automation Technology in Engineering Drawing, Monterey, Calif.,
November 1979.
One specific texture measure, namely the Max-Min measure, is evaluated
in this research note. A modification of the conventional Maximum Likelihood
Classifier, wherein multivariate normal populations are assumed, is used to classify
the texture signatures. The modified version produces the R most likely
candidates and R probabilities associated with the estimates rather than just
the most likely candidate. The R probabilities are introduced to a
postprocessing relaxation scheme in an attempt to reduce misclassifications
and false alarms.


MATHEMATICAL  DESCRIPTION  OF  THE  EXPERIMENT 

Everyone  seems  to  know  what  texture  is  as  long  as  a  precise  definition  is 
not  required.  The  hesitancy  in  producing  a  precise  definition  increases  whenever 
the  intuitive  notions  of  texture  must  be  modified  to  describe  image  texture. 
A  variety  of  definitions  and  texture  descriptors  are  available  to  choose  from.3 
In  the  work  at  CSL,  image  texture  has  been  regarded  as  a  point  and/or  line 
pattern  that  is  somewhat  repetitive.  The  minute  pattern  of  detail  can  be  described 
by  measures  of  spatial  image  structure  that  convey  the  idea  of  varying  degrees 
of  coarseness.  If  texture  is  regarded  as  the  relative  frequency  of  local  extremes 
in  pixel  intensity,  then  one  measure  is  the  Max-Min  texture  measure,  which  is 
the  number  of  local  gray  level  maximums  and  minimums  along  a  one-dimensional 
scan.4 


Max-Min Texture Measure • Consider a one-dimensional array of quantized
density values (gray shades). Density is used rather than transmission,
because a uniform multiplicative change in light intensity will not affect a density
gray shade change over a uniform texture. For example, if one part of a field
is shaded and another fully illuminated by the sun, then Max-Min texture
measures extracted from the two should be equal. Unlike LANDSAT, which
provides four spectral gray shades for each pixel, a panchromatic image offers
only one gray shade per pixel, so texture measures associated with a pixel must
be estimated from data at and around the pixel. The method of determining the
Max-Min texture measure is described next.


3Robert M. Haralick, "Statistical and Structural Approaches to Texture," Proceedings of the IEEE, Vol.
67, No. 5, May 1979.

4Owen R. Mitchell, Charles R. Myers, and William Boyne, "A Max-Min Measure for Image Texture
Analysis," IEEE Trans. Comput., Vol. C-25, April 1977.
A rectangular window of gray shades centered over the point in question
is specified along with a fixed set of threshold values. Four types of scan lines
are considered over the window, namely the set of horizontal lines, the set of
vertical lines, the set of left diagonal lines, and the set of right diagonal lines.
Essentially, a line of gray shades from any one of the four sets is selected along
with one of the NT threshold values T. A smoothing process, which is a
function of the specific T, is applied to the data in order to remove small
amplitude changes so that only the significant extremes are retained and
counted. The extremes that exceed T are counted and summed over the specific
set of lines. Each of the four sets of lines will produce NT extreme counts. The
next step in the process of generating Max-Min texture vectors is to ratio the
number of extremes at one value of T to the number of extremes at the next
larger value, thus producing four (NT-1) component vectors where each
component is a nonnegative number less than, or equal to, one. The reason for
this step is to produce texture data that is invariant to the absolute number of
extremes. There should then be a window size of dimension NW that produces
consistent texture data in which an increase in dimension does not improve
consistency. This is an important consideration, since small windows produce
better output resolution and require less computing time than larger windows.
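The counting and ratio steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the experiment: the names `scan_lines`, `count_extrema`, and `max_min_vector` are hypothetical, the smoothing is approximated by a simple hysteresis rule (a reversal is counted only after the gray shade moves at least T away from the running extreme), the assignment of "left" and "right" to the two diagonal directions is arbitrary, and each ratio is taken as the count at the larger threshold divided by the count at the smaller one so that the components cannot exceed one.

```python
def scan_lines(window):
    """Split a 2-D window (list of rows) into the four sets of scan lines."""
    rows, cols = len(window), len(window[0])
    horizontal = [list(row) for row in window]
    vertical = [[window[r][c] for r in range(rows)] for c in range(cols)]
    left, right = {}, {}
    for r in range(rows):
        for c in range(cols):
            # Assumption: "left" diagonals run top-left to bottom-right.
            left.setdefault(c - r, []).append(window[r][c])
            right.setdefault(r + c, []).append(window[r][c])
    return {
        "horizontal": horizontal,
        "vertical": vertical,
        "left_diagonal": list(left.values()),
        "right_diagonal": list(right.values()),
    }


def count_extrema(line, t):
    """Count extremes along one scan whose swing is at least t (hysteresis)."""
    if len(line) < 2:
        return 0
    count = 0
    direction = 0            # +1 rising, -1 falling, 0 not yet known
    lo = hi = extreme = line[0]
    for v in line[1:]:
        if direction == 0:
            lo, hi = min(lo, v), max(hi, v)
            if v - lo >= t:
                direction, extreme = 1, v
            elif hi - v >= t:
                direction, extreme = -1, v
        elif direction == 1:
            if v > extreme:
                extreme = v
            elif extreme - v >= t:      # a significant maximum was passed
                count += 1
                direction, extreme = -1, v
        else:
            if v < extreme:
                extreme = v
            elif v - extreme >= t:      # a significant minimum was passed
                count += 1
                direction, extreme = 1, v
    return count


def max_min_vector(lines, thresholds):
    """Ratios of summed extreme counts at successive thresholds (NT-1 values)."""
    counts = [sum(count_extrema(line, t) for line in lines) for t in thresholds]
    # Assumption: a ratio with a zero denominator is reported as 0.0.
    return [counts[k + 1] / counts[k] if counts[k] else 0.0
            for k in range(len(counts) - 1)]
```

Calling `max_min_vector` once for each of the four entries returned by `scan_lines` yields the four directional texture vectors for a window.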


Maximum Likelihood Classification Rule • The maximum likelihood
rule as described in appendix B was used in this experiment for the following
reasons. The Bayesian classifier (see appendix A) from which it was derived has
a logical and intuitive appeal. The specification of the multivariate normal
population in the Bayesian rule rested on familiarity with the process as well
as a certain comfort derived from the Central Limit Theorem. Finally, the
procedure is well documented, and it lends itself well to the modifications
employed in this experiment.


The  Max-Min  algorithm  described  above  produces  four  texture  vectors,  one 
for  each  linear  direction  within  a  window.  There  are  various  ways  the  data  could 
be  processed.  For  example,  one  direction  could  be  chosen  and  used  throughout. 
This  would  reduce  the  compute  time  considerably.  Also,  the  four  texture  vectors 
could  be  averaged  and  the  average  vector  characterized.  This  procedure  would 
not  be  as  effective  in  the  reduction  of  computing  time,  but  areal  texture  would 
be  measured.  Finally,  the  four  texture  vectors  could  be  processed  separately  and 
the  final  characterization  derived  from  the  four  results.  This  procedure  is  the  most 
time  consuming,  but  the  algorithm  is  simple  enough  so  that  if  it  were  proven 
worthwhile,  a  hardwired  device  could  be  developed.  The  last  approach  was 
selected  and  is  described  next. 
Consider any one of the four texture vectors. Instead of choosing the most
likely class as described in appendix B, choose the R most likely classes. This
was done in the following manner:

Produce the N decision values for each of the four directions.
Normalize the vectors of decision values, that is, let

$$T_s = [D'_{1s}(X),\ D'_{2s}(X),\ \ldots,\ D'_{Ns}(X)]$$

s = 1, 4 directions

N: Possible number of classes

The $D'_{is}(X)$ are normalized versions of the decision functions described in
appendix B, so that

$$\sum_{i=1}^{N} D'_{is}(X) = 1.$$

Let $\bar{T} = \sum_{s=1}^{4} T_s$ and choose the R largest of the N components of $\bar{T}$.

Normalize these data and produce the following output.

$$[(C_{i1}, P(C_{i1})),\ (C_{i2}, P(C_{i2})),\ \ldots,\ (C_{iR}, P(C_{iR}))]$$

$C_{ir}$: the rth most likely of the i = 1, N possibilities for X.

The weights $P(C_{ir})$ will be regarded as probabilities in the relaxation scheme
described next.
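The selection of the R most likely classes can be illustrated with a short Python sketch. It assumes the decision values are nonnegative numbers for which larger means more likely (appendix B, which defines them, is not reproduced here), and the function name `top_r_classes` is hypothetical.

```python
def top_r_classes(decisions, r):
    """Combine per-direction decision values and keep the r most likely classes.

    decisions: one list of N nonnegative decision values per direction.
    Returns [(class_label, weight), ...] with labels 1..N and weights summing to 1.
    """
    n = len(decisions[0])
    combined = [0.0] * n
    for values in decisions:
        total = sum(values)
        for i, v in enumerate(values):
            combined[i] += v / total          # normalize each direction, then sum
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)[:r]
    norm = sum(combined[i] for i in ranked)   # renormalize over the r survivors
    return [(i + 1, combined[i] / norm) for i in ranked]
```

For example, `top_r_classes([[1, 2, 1], [1, 1, 2], [0, 3, 1], [3, 1, 0]], 2)` keeps classes 2 and 1, in that order, with weights that sum to one.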


Relaxation Labeling • The relaxation scheme used in this experiment
is an application of a nonlinear probabilistic model to relaxation labeling.5
Consider a set of results from the classification process described. The output
for each of the processed pixels is an R-tuple in which each component is made
up of a label $C_{ir}$ and an associated weight $P(C_{ir})$, where i = 1, N possible
classifications (labels) and r = 1, R most likely labels for the specific point.
Since each of the weights is greater than zero and since $\sum_{r} P(C_{ir}) = 1$, the
weights can be regarded as probabilities.


The vector of probabilities is processed in parallel by reviewing each
component with respect to its neighbors. Modifications to the probabilities are
made to reflect neighboring pixel information and to reflect user-imposed
weights and constraints. The constraints are realized by a user-defined
relational matrix that describes the compatibility of neighboring classes. The
probability update is expressed in the following formula:


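$$P_I^{k+1}(C_\ell) = \frac{P_I^{k}(C_\ell)\,[1 + q_I^{k}(C_\ell)]}{\sum_{\ell' = i1}^{iR} P_I^{k}(C_{\ell'})\,[1 + q_I^{k}(C_{\ell'})]}$$

$$q_I^{k}(C_\ell) = \sum_{g=1}^{G} W_g \sum_{\gamma = i1}^{iR} a(C_\ell, C_\gamma)\, P_g^{k}(C_\gamma)$$

$\ell$ = i1, iR: number of labels (classes)

k = 1, K: iterations of the process

I = 1, P: pixels to process

g = 1, G: neighbors used in the update

$W_g$: user assigned weights. Note that $\sum_g W_g = 1$.

$a(C_\ell, C_\gamma)$: N x N compatibility matrix.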

5Azriel Rosenfeld, Robert A. Hummel, and Steven W. Zucker, "Scene Labeling by Relaxation Operations,"
IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-6, No. 6, June 1976.
The determination of the weights ($W_g$) and the compatibility values in
$a(C_\ell, C_\gamma)$ is left up to the user. These inputs allow the user some control of the
iterative update. For example, if the weight of the pixel under modification is
zero, then the neighbors completely determine the classification. The
compatibility factors are numerical values between minus one and plus one. A
value of plus one for $a(C_\ell, C_\gamma)$ implies that classes $C_\ell$ and $C_\gamma$ can occur
together with no problem, whereas a value of minus one implies that the
existence of $C_\ell$ at location I denies the existence of $C_\gamma$ at neighboring pixels.
A value of zero implies that the existence of label $C_\ell$ at location I has no
bearing on the existence of $C_\gamma$ at neighboring pixels.


The intention here is to use the relaxation labeling concept as a means to
provide the computer with a larger view of the scene by relating derived data in
a rational manner. There are various ways that the algorithm can be modified.
It is expected that modification will come about as practical experience is
obtained. There are many aspects of the algorithm as defined above that need to
be tested. For example, using $(1 + q)^{\alpha}$ with $\alpha > 1$ will speed up convergence;
however, the effect of distant points will not be felt if the number of iterations
is reduced. It should be noted that the compatibility matrix need not, and
probably should not, be symmetric. It should also be noted that if the initial
probability of a class is zero, then the algorithms as employed in this
experiment will never revise the probability, even if every neighbor of the
point insists on that classification.
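One pass of the probability update described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the experiment: each pixel carries a full vector of N class probabilities (zeros for classes outside its R most likely), the helper names `ring_weights` and `relax_step` are hypothetical, and pixels near the edge of the data simply renormalize the weights of the neighbors that exist. The ring weights 1/3, 1/24, and 1/48 are the ones quoted later in this note for the 5 x 5 neighborhood.

```python
def ring_weights():
    """Weights for a 5 x 5 neighborhood: 1/3 center, 1/24 inner ring, 1/48 outer."""
    per_ring = (1.0 / 3.0, 1.0 / 24.0, 1.0 / 48.0)
    return {(dr, dc): per_ring[max(abs(dr), abs(dc))]
            for dr in range(-2, 3) for dc in range(-2, 3)}


def relax_step(probs, weights, compat):
    """Apply one nonlinear relaxation update to every pixel.

    probs:   dict mapping (row, col) to a list of N class probabilities.
    weights: dict mapping (drow, dcol) offsets to neighbor weights W_g.
    compat:  N x N compatibility matrix with entries in [-1, 1].
    """
    n = len(compat)
    updated = {}
    for (r, c), p in probs.items():
        # Assumption: near the boundary, use only neighbors that exist and
        # rescale their weights so they still sum to one.
        present = {off: w for off, w in weights.items()
                   if (r + off[0], c + off[1]) in probs}
        wsum = sum(present.values())
        q = [0.0] * n
        for (dr, dc), w in present.items():
            pg = probs[(r + dr, c + dc)]
            for label in range(n):
                q[label] += (w / wsum) * sum(
                    compat[label][g] * pg[g] for g in range(n))
        raw = [p[label] * (1.0 + q[label]) for label in range(n)]
        total = sum(raw)
        updated[(r, c)] = [v / total for v in raw] if total > 0 else list(p)
    return updated
```

Because each new probability is the old one multiplied by a factor, a class whose probability starts at zero stays at zero, which is the limitation noted above.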


NUMERICAL  EXPERIMENTS 

Test Regions • Ten digital subscenes (1024 x 1024 pixels x 8 bits) were
extracted from the Digital Image Analysis Laboratory (DIAL) library for the
texture analysis. The parameters of the taking geometry and of the scan
procedure are described in a previous research note.6 The set of 10 subscenes
is composed of a near infrared exposure (IR) and a corresponding panchromatic
exposure (PANC) for each of 5 scenes. The 5 scene pairs are shown in figures 1
through 5.


6Michael A. Crombie, An Evaluation of Conventional Correlation Methods When Matching Infrared Imagery
to Panchromatic Imagery, U.S. Army Engineer Topographic Laboratories, Fort Belvoir, VA, ETL-0195,
August 1979, AD-A076 111.
FIGURE 4. Scene E From Exposure

The portions of the subscenes selected for the texture analysis were
designated by rectangles. The scene content in each rectangular window was
determined by looking at the IR scene on the DIAL display. Results of the visual
designations are presented below. A regular pattern of points (every 10 lines
and every 10 pixels) over each rectangular area was used to develop the texture
signatures.


SCENE A                          SCENE B

CLASS  TYPE                      CLASS  TYPE
1      Building and Road         1      Heavy Forest
2      Gray Field                2      Scrub
3      Rough Field               3      Field, Building, and Road
4      Heavy Forest              4      Dark Field
5      Light Field               5      Light Field
6      Light Forest              6      Light Forest

SCENE C                          SCENE E

CLASS  TYPE                      CLASS  TYPE
1      Heavy Forest (Light)      1      Dark Field
2      Heavy Forest (Dark)       2      Light Field
3      Light Forest              3      Heavy Forest
4      Light Field               4      Scrub
5      Gray Field                5      Building and Road
                                 6      Light Forest

SCENE H

CLASS  TYPE
1      Dark Field
2      Light Field
3      Scrub
4      Building and Road

Three  window  sizes  were  used  to  develop  the  texture  signatures. 

WINDOW     GROUND FOOTPRINT (FEET)

9 x 9      26 x 26
15 x 15    46 x 46
21 x 21    66 x 66

The 15 threshold values used to develop the Max-Min features were:
1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, and 39.

The  Max-Min  texture  feature  vectors  (14  components)  were  calculated 
and  stored  on  disc  for  a  variety  of  analyses. 


Numerical  Results  •  Four  kinds  of  numerical  results  were  derived 
from  the  texture  features;  namely  component  compression,  divergence  measures, 
confusion  matrices  before  relaxation,  and  confusion  matrices  after  relaxation. 
Component Compression • A principal component analysis of the
data was performed to determine whether a significant amount of information
contained in the 14-component Max-Min texture feature vectors could be
explained by fewer components. Results from the 648 separate component
analyses are summarized in table 1.


TABLE 1. Principal Component Results by Scene.

                              SAMPLING WINDOW SIZE
                        9                 15                 21
                              PRINCIPAL COMPONENTS
SCENE  EXPOSURE    3     5     7      3     5     7      3     5     7

A      PANC      72.5  83.9  91.2   75.8  88.0  94.2   77.6  89.8  95.3
       IR        73.1  83.8  91.2   75.2  87.2  93.8   75.9  88.0  94.1
B      PANC      73.9  84.2  91.2   75.0  87.0  93.4   75.7  87.4  93.4
       IR        75.5  85.2  91.7   74.0  85.7  92.2   74.4  85.5  92.0
C      PANC      68.9  81.8  90.1   64.8  79.7  88.1   63.3  78.8  87.7
       IR        70.3  82.3  90.3   66.7  81.3  89.6   67.9  82.1  89.7
E      PANC      73.8  84.6  91.6   74.7  86.7  93.3   71.4  85.5  92.8
       IR        73.3  84.4  91.5   67.4  80.9  89.1   65.4  80.2  88.7
H      PANC      70.7  82.2  89.7   75.1  87.4  93.8   78.2  90.6  95.5
       IR        71.4  82.8  90.4   74.7  86.1  92.9   77.1  89.2  94.8

The  tabular  entries  are  percentages  of  variation  explained  by  the  first  3,  5, 
and  7  principal  components.  The  results  are  organized  by  scenes,  by  the  two 
kinds  of  exposures,  and  by  the  sampling  window  size.  The  results  from  table 
1  were  averaged  over  scene  and  presented  in  table  2.  The  results  of  table  2 
show  little  variation  over  window  size  or  exposure  type. 


TABLE 2. Principal Component Results.

                 SAMPLING WINDOW SIZE
               9            15            21
EXPOSURE    3   5   7    3   5   7    3   5   7

PANC       72  83  91   73  86  93   73  86  93
IR         73  84  91   72  84  92   72  85  92
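The averaging that takes table 1 to table 2 is simple enough to check directly. The sketch below copies the table 1 percentages (rows in scene order A, B, C, E, H; columns are 3, 5, and 7 components for window sizes 9, 15, and 21) and averages each column over the five scenes. Rounding to the nearest whole percent is an assumption about how table 2 was prepared, and the names `TABLE_1` and `scene_average` are hypothetical.

```python
TABLE_1 = {
    "PANC": [
        [72.5, 83.9, 91.2, 75.8, 88.0, 94.2, 77.6, 89.8, 95.3],  # Scene A
        [73.9, 84.2, 91.2, 75.0, 87.0, 93.4, 75.7, 87.4, 93.4],  # Scene B
        [68.9, 81.8, 90.1, 64.8, 79.7, 88.1, 63.3, 78.8, 87.7],  # Scene C
        [73.8, 84.6, 91.6, 74.7, 86.7, 93.3, 71.4, 85.5, 92.8],  # Scene E
        [70.7, 82.2, 89.7, 75.1, 87.4, 93.8, 78.2, 90.6, 95.5],  # Scene H
    ],
    "IR": [
        [73.1, 83.8, 91.2, 75.2, 87.2, 93.8, 75.9, 88.0, 94.1],  # Scene A
        [75.5, 85.2, 91.7, 74.0, 85.7, 92.2, 74.4, 85.5, 92.0],  # Scene B
        [70.3, 82.3, 90.3, 66.7, 81.3, 89.6, 67.9, 82.1, 89.7],  # Scene C
        [73.3, 84.4, 91.5, 67.4, 80.9, 89.1, 65.4, 80.2, 88.7],  # Scene E
        [71.4, 82.8, 90.4, 74.7, 86.1, 92.9, 77.1, 89.2, 94.8],  # Scene H
    ],
}


def scene_average(rows):
    """Average each column over the scenes and round to a whole percent."""
    return [round(sum(column) / len(rows)) for column in zip(*rows)]


for exposure, rows in TABLE_1.items():
    print(exposure, scene_average(rows))
```

The printed averages agree with the PANC and IR rows of table 2.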
Divergence Measure • The amount of information, $I(C_I : C_J)$,
contained in the N-dimensional vector X for discriminating in favor of class I
over class J is described in appendix C. Similarly, the amount of information
contained in X for discriminating in favor of class J over class I is $I(C_J : C_I)$,
and by definition, $J(C_I, C_J) = I(C_I : C_J) + I(C_J : C_I)$ is a measure of the
difficulty of discriminating between class I and class J. $J(C_I, C_J)$ is a
nonnegative number and is called divergence.


Large values of $J(C_I, C_J)$ are indicative of strong discriminatory power,
whereas small values of $J(C_I, C_J)$ indicate poor discriminatory power. The
purpose of the divergence analysis was to determine if a worthwhile relation
between classification error and the associated divergence could be developed so
that a variety of texture parameters could be evaluated without performing the
computing-intensive, maximum-likelihood evaluation and subsequent review of
the resultant confusion matrices.
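To give a feel for how divergence behaves, the sketch below evaluates it for two univariate normal populations, where the sum of the two information terms has a simple closed form. This one-dimensional case is only an illustration chosen for brevity; the experiment used multivariate normal populations as described in appendix C, and the function name `divergence_1d` is hypothetical.

```python
def divergence_1d(mean_1, var_1, mean_2, var_2):
    """Divergence J between two univariate normal populations.

    J = 0.5 * (var_1/var_2 + var_2/var_1 - 2)
        + 0.5 * (mean_1 - mean_2)**2 * (1/var_1 + 1/var_2)
    """
    spread_term = 0.5 * (var_1 / var_2 + var_2 / var_1 - 2.0)
    shift_term = 0.5 * (mean_1 - mean_2) ** 2 * (1.0 / var_1 + 1.0 / var_2)
    return spread_term + shift_term
```

Identical populations give a divergence of zero, and the value grows as the means separate or the variances differ, which matches the reading of large values as strong discriminatory power.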


Two measures of association were calculated to determine whether a
consistent relation between divergence and classification errors exists. The first
was the linear correlation coefficient $R_{D,E}$, which is a measure of the linear
dependence between divergence and classification error. The second was
Spearman's rank correlation coefficient $\rho$, which is a measure of the degree of
correlation between rankings. This statistic is calculated from the difference in
rankings. The classification errors are ranked from 1 to M (smallest to largest).
The divergence values are also ranked from 1 to M (largest to smallest). In
both cases M is the number of combinations of the N possible classes taken
two at a time. A summary of the results is presented in table 3.
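The rank statistic can be sketched as below, using the usual rank-difference form of Spearman's coefficient, 1 - 6 * sum(d^2) / (M * (M^2 - 1)). The sketch assumes there are no tied values, and the names `pair_count`, `ranks`, and `spearman` are hypothetical.

```python
def pair_count(n_classes):
    """M: number of combinations of the N classes taken two at a time."""
    return n_classes * (n_classes - 1) // 2


def ranks(values, largest_first=False):
    """Rank values from 1 to M (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=largest_first)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(errors, divergences):
    """Rank correlation between classification errors and divergences.

    Errors are ranked smallest to largest and divergences largest to
    smallest, so a perfect 'high divergence, low error' relation gives +1.
    """
    m = len(errors)
    error_ranks = ranks(errors)
    divergence_ranks = ranks(divergences, largest_first=True)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(error_ranks, divergence_ranks))
    return 1.0 - 6.0 * d_squared / (m * (m * m - 1))
```

With six classes, `pair_count(6)` gives M = 15 class pairs to rank.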
TABLE 3. Classification Error Versus Divergence Statistics.

                   LINEAR CORRELATION $R_{D,E}$     RANK CORRELATION $\rho$
                     SAMPLING WINDOW SIZE           SAMPLING WINDOW SIZE
SCENE  EXPOSURE      9       15      21             9       15      21

A      PANC        -0.41   -0.50   -0.53           0.39    0.51    0.70
       IR          -0.37   -0.33   -0.52           0.48    0.45    0.61
B      PANC        -0.65   -0.58   -0.68           0.71    0.63    0.71
       IR          -0.12   -0.79   -0.63           0.23    0.74    0.64
C      PANC        -0.62   -0.73   -0.62           0.72    0.92    0.96
       IR          -0.69   -0.77   -0.81           0.82    07      0.94
E      PANC        -0.61   -0.78   -0.59           0.58    0.88    0.96
       IR          -0.45   -0.54   -0.35           0.49    0.88    0.95
H      PANC        -0.65   -0.92   -0.81           0.66    0.94    0.54
       IR          -0.50   -0.46   -0.59           0.43    0.26    0.26
Confusion Matrices Before Relaxation • The statistical entities
needed to evaluate the decision functions described in the section on the
Maximum Likelihood Rule were computed from the stored Max-Min texture
features. The derived population parameter estimates were used to classify the
stored Max-Min texture features by designating the most likely of the R = 3 most
likely classifications as the correct classification. The designated classifications
were compared to the known classifications in order to develop a matrix of hits
and misses. These results were modified to produce the confusion matrices given
in appendix D. The 10 sets of confusion matrices are organized first by scene
and then by exposure type. Each of the 10 tables of data is composed of 6
subtables organized by sampling window size and by maximum likelihood
results and by relaxation results.
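The matrix of hits and misses can be tallied as in the sketch below. The note does not say how the raw tallies were modified to produce the matrices in appendix D, so the conversion to row percentages shown here is only one plausible choice, and the names `confusion_matrix` and `row_percentages` are hypothetical.

```python
def confusion_matrix(known, assigned, n_classes):
    """Tally hits and misses: rows are known classes, columns assigned classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for k, a in zip(known, assigned):
        matrix[k - 1][a - 1] += 1      # class labels run from 1 to N
    return matrix


def row_percentages(matrix):
    """Express each row as percentages of the points known to be in that class."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * v / total if total else 0.0 for v in row])
    return result
```

Diagonal entries are correct classifications; off-diagonal entries in a column are false alarms for that column's class.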


Confusion  Matrices  After  Relaxation  •  The  relaxation  process  described 
in  the  Relaxation  Labeling  section  was  applied  to  the  derived  vectors  of  most 
likely  classes  and  associated  probabilities.  The  following  values  were  used  to 
run  the  algorithms: 


R = 3 most likely classes
K = 5 iterations
G = 25 neighbors used in the update

$a(C_\ell, C_\gamma) = 1$ if $\ell = \gamma$
$a(C_\ell, C_\gamma) = 0$ if $\ell \neq \gamma$


A  5  x  5  window  centered  over  the  point  to  be  updated  defined  the  G  = 
25  neighbors.  In  this  example,  the  point  to  be  modified  was  used  in  the  update. 
The  vector  of  probabilities  associated  with  the  center  point  was  given  a  weight 
of  W  =  1/3.  Each  of  the  vectors  of  probabilities  associated  with  the  eight  nearest 
neighbors  was  given  a  weight  of  W  =  1/24.  Each  of  the  vectors  of  probabilities 
associated  with  the  16  next  nearest  neighbors  was  given  a  weight  of  W  =  1/48. 
The  weighting  procedure  was  modified  to  handle  the  situation  when  points 
along  the  boundary  of  derived  data  (or  points  adjacent  to  the  boundary)  were 
updated. The results of the relaxation are presented in appendix D. The
organization of the results is identical to that described in the previous section.


DISCUSSION 

The training areas outlined on the scenes (see figures 1 to 5) were
defined on the IR imagery by mathematicians, not experienced photo interpreters.
For example, heavy forests were distinguished from light forests, and these
from scrub, purely from appearance. A subjective evaluation of the density of
trees was the criterion used. The same was true for Scene C, when heavy forest
(light) and heavy forest (dark) were characterized and distinguished from
light forest. Dark and light pertained to tone in the heavy forest descriptions.
It was noted that the two appeared more dissimilar on the IR than on the
PANC, yet the classification scores were similar, especially when Max-Min texture
features were extracted using 21 x 21 windows. It was also noted that the
distinction between a light field and a dark field identified on the IR was less
pronounced on the PANC. Again, the classification performances for the two
exposure types were comparable, which indicates a difference in Max-Min texture
between light and dark fields. It was noted after the experiment was well
underway that the training region around the buildings and road (class 3) on
Scene B was too large. The rectangle included too large an area of light and
dark fields. The resultant poor classification of this area in Scene B is
noticeable, especially when compared to results for small window sizes from the
other scenes.


The  purpose  of  the  initial  experiment  in  texture  analysis  was  to  evaluate 
the  Max-Min  texture  signature  as  a  means  to  sort  out  broad  areas,  such  as  forests 
and  fields,  and  to  isolate  cultural  detail  from  surrounding  natural  detail.  It  is 
not  expected  that  texture  will  provide  enough  information  to  distinguish  among 
the  various  DLMS  cultural  features.  It  does  appear  from  the  results  to  date 
that  texture  can  isolate  structures  for  subsequent  identification  by  a  photo 
interpreter  on  a  display  device.  This  is  especially  true  if  the  preliminary  results 
from the classification exercise are refined by relaxation. In fact, the relaxation
process as shown in the confusion matrices presented in appendix D
considerably reduced misclassification and false alarms.


The  results  to  date  are  not  extensive  over  scene  types,  scale,  or  exposure 
type.  Five  scenes  were  tested  (actually  subsets  of  those  scenes)  and  those  scenes 
were  generally  alike.  Two  exposure  types  (near  infrared  and  panchromatic) 
with  near-identical  scales  (1:70,000)  were  used.  Three  different  window  sizes 
were  evaluated.  The  quality  of  the  classification  results  and  the  quality  of  the 
relaxation  results  noticeably  improved  as  the  window  size  increased.  It  should 
be  noted  that  the  computing  time  lengthens  as  the  window  size  increases  and  that 
the  resolution  of  boundaries  between,  say,  fields  and  forests  will  diminish  as  the 
window  size  increases.  Owing  to  the  lengthy  computer  runs  associated  with  this 
experiment,  only  one  set  of  thresholds  was  tested,  namely  that  given  in  the 
section  on  test  regions. 


Two statistical studies were performed during the course of the experiment.
The first was a compression study wherein the method of principal components
was used to explain the percentage of variation accounted for by the first three,
by the first five, and by the first seven principal components. Results averaged
over scene, over exposure type, and over window size show that 72 percent of
the variation is explained by the first three components, 85 percent by the
first five components, and 92 percent by the first seven components. It should
be noted that these results pertain to individual subscene statistics rather than to
overall scene statistics. Classification results were not evaluated using the reduced
texture signatures owing to the lack of time. Tests using compressed texture
vectors should be conducted on subsequent experiments, since it is impossible
to estimate results using the percentage of explained variation as a predictor.

An unsuccessful attempt was made to use the statistical measure of
divergence as a measure of effectiveness for comparing one set of texture
parameters to another. The correlation values $R_{D,E}$ presented in table 3
indicate a negative correlation (as divergence increases, classification error
decreases); however, the values do not indicate a strong linear trend. Several
sets of data were plotted to determine if a functional relation other than a
linear one existed. Such a relation was not apparent from the plots. The results
were ranked as described in the section on Divergence Measure to determine
whether there was a consistency between the two sets of results. Only those
values underlined in table 3 refute (at the 1 percent significance level) the
hypothesis that the two sets of rankings were random. Generally, there is some
support for a relationship between divergence and classification error, but not
enough to warrant doing away with the classification tests.

The Max-Min texture measure was tested and found to be an excellent
signature for scene classification, especially when relaxation labeling processes
are used to refine the classification results. The necessity of using regions about
the point in question to estimate texture, rather than estimating texture from
several pixel component values, as is done in LANDSAT, causes a loss of
resolution in determining textural boundaries. The loss in resolution becomes
more pronounced as the sampling window increases in size. The results of the
tests to date indicate that a 9 x 9 window is too small for the problem at hand.
The smaller windows, although more precise for determining textural boundaries,
tend to measure the texture of the trees rather than the texture of the forest.


The loss in resolution at textural boundaries caused by larger sampling
windows can be recovered by employing three algorithms in a cooperative mode:
(1) digital stereo compilation, (2) edge detection (thresholded gradient
method), and (3) texture classification. In practice, stereo images exist that
can be used to extract two estimates of texture for each point. The procedure is
feasible if the process is executed along with the stereo compilation exercise.
In the same manner, the edge algorithm can be used to produce two corresponding
sets of edge data. The processes can be integrated and controlled through known
exterior data relating the stereo imagery and by relational statements through
combined relaxational processes. For example, local slope information derived
from X-parallax data can be used to modulate the classification process and
vice versa.


CONCLUSIONS 

1.  Max-Min  is  a  practical  measure  of  image  texture  that  needs  further 
investigation. 

2.  Relaxation  labeling  appears  to  be  a  valuable  method  for  removing 
noise  and  ambiguities  from  derived  classifications. 

3. Edge detection methods should be used, along with classifications
derived from texture, to define boundaries between broad areas.

4.  Texture  can  be  used  to  isolate  cultural  detail  from  natural  detail. 

5.  Divergence  does  not  appear  to  be  a  worthwhile  predictor  for  precise 
classification  performance. 

6. Relaxation methods should be extended to encompass edge enhancement,
elevation refinement, and texture classifications in a cooperative mode.




APPENDIX  A. 


Bayes Classifier • Regard the classification process as part of a two-person,
zero-sum game G = (U, V, L), where U is an (N x 1) vector of strategies for the
first player, V is an (N x 1) vector of strategies for the second player, and L
is an (N x N) loss function. If the first player (Mother Nature) chooses a class
c_i ∈ U and if the second player (photo interpreter) chooses a classification
c_j ∈ V, then the second player loses l_ij. Suppose the first player selects
class c_i based on the prior probability p(c_i) and produces an (M x 1) Max-Min
texture vector X. The posterior probability that X belongs to class c_i is
p(c_i/X). If the unfortunate second player decides that X comes from c_j, he
loses l_ij. From the second player's point of view, the pattern vector X could
have come from any one of the N possible classes. His expected loss for
assigning the signature to the jth class is

$$
r_j(X) = \sum_{i=1}^{N} l_{ij}\, p(c_i/X).
$$


Suppose the second player (numerical classification process acting in
place of the photo interpreter) calculates r_j(X), j = 1, ..., N, and assigns the
pattern X to the class with the smallest r_j(X). If this is done for all X, then
the total expected loss with respect to all decisions will be minimized. Such a
classifier is called a Bayes Classifier.


The Bayes formula for posterior probability is

$$
p(c_i/X) = \frac{p(c_i)\, p(X/c_i)}{p(X)}
$$

where

$$
p(X) = \sum_{k=1}^{N} p(c_k)\, p(X/c_k).
$$

The probability p(X/c_i) is the conditional probability of X, given that
c_i has occurred. p(X/c_i) is called the likelihood function of class c_i.
Substitute the likelihood function into the loss function r_j(X) and get

$$
r_j(X) = \sum_{i=1}^{N} l_{ij}\, p(c_i)\, p(X/c_i).
$$

Note that p(X) is a common factor in r_j(X) for all j and is dropped.
In general, the pattern X is assigned to class c_i if r_i(X) < r_j(X) for
j = 1, ..., N and j ≠ i.
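The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the experiments: the function name `bayes_classify`, the array layout, the zero-based class indices, and every numeric value are hypothetical. The sketch forms p(c_i) p(X/c_i) for each class, weights these products by the loss matrix to obtain r_j(X), and selects the class with the smallest value.

```python
import numpy as np

def bayes_classify(priors, likelihoods, loss):
    """Return (index of the class with the smallest r_j, array of all r_j).

    priors[i]      = p(c_i)
    likelihoods[i] = p(X/c_i), evaluated at the observed X
    loss[i, j]     = l_ij, the loss when class i is true and class j is called
    """
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    loss = np.asarray(loss, dtype=float)
    joint = priors * likelihoods      # p(c_i) p(X/c_i); the common factor p(X) is dropped
    risks = loss.T @ joint            # r_j = sum over i of l_ij p(c_i) p(X/c_i)
    return int(np.argmin(risks)), risks

# Hypothetical three-class example (classes indexed 0, 1, 2; values illustrative only).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.30]

# Unit loss for every miscall, no loss for a correct call.
unit_loss = 1.0 - np.eye(3)
unit_label, unit_risks = bayes_classify(priors, likelihoods, unit_loss)

# Same data, but calling class 1 when class 0 is true is made five times as costly.
costly_loss = unit_loss.copy()
costly_loss[0, 1] = 5.0
costly_label, costly_risks = bayes_classify(priors, likelihoods, costly_loss)

print(unit_label, unit_risks)
print(costly_label, costly_risks)
```

In this made-up example the unit-loss matrix selects the class with the largest product p(c_i) p(X/c_i), while the asymmetric matrix moves the decision to a different class, which shows how the loss function L enters the rule.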


Suppose there is no loss for a correct classification and a unit loss for a
misclassification; then l_ii = 0 for all i = 1, ..., N and l_ij = 1 otherwise.
Put these values into r_j(X) and get

$$
r_j(X) = \sum_{i=1}^{N} p(c_i)\, p(X/c_i) - p(c_j)\, p(X/c_j)
$$

$$
r_j(X) = p(X) - p(c_j)\, p(X/c_j).
$$

This says to assign the observation X to class c_i if

$$
p(X) - p(c_i)\, p(X/c_i) < p(X) - p(c_j)\, p(X/c_j)
$$

for all j = 1, ..., N and j ≠ i.

This relation is written in its final form as

$$
p(c_i)\, p(X/c_i) > p(c_j)\, p(X/c_j).
$$


APPENDIX B.


Bayes Classifier for Multivariate Normal Signatures • From appendix
A, the Bayes classifier for the special case of no loss for a correct call and
unit loss for a miscall is

$$
p(c_i)\, p(X/c_i) > p(c_j)\, p(X/c_j).
$$

This relation says to assign the texture vector X to class c_i if the
inequality holds for all j ≠ i. The probability p(c_k) is the prior probability
of the kth class, and the probability p(X/c_k) is the conditional probability of
X, given that the kth class has occurred.


The classification problem has been reduced to calculating N decision
functions d_i = p(c_i) p(X/c_i) and assigning X to the class associated with the
largest d_i. Assume that texture signatures are distributed according to the
Multivariate Normal distribution

$$
p(X/c_i) = \frac{1}{(2\pi)^{N/2}\, |\Sigma_i|^{1/2}}\;
           e^{-\frac{1}{2}(X-\mu_i)^{T}\, \Sigma_i^{-1}\, (X-\mu_i)}
$$

where μ_i and Σ_i are the mean and covariance matrix of the distribution.


The decision functions can be simplified by taking natural logarithms of
the functions and using the logarithms as decision functions. This is a valid
operation since the log function is a monotonically increasing function.

$$
\ln(d_i) = \ln p(c_i) - \frac{N}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i|
           - \frac{1}{2}(X-\mu_i)^{T}\, \Sigma_i^{-1}\, (X-\mu_i)
$$

The value -(N/2) ln 2π does not depend on i and can be ignored. The
decision functions turn out to be

$$
d_i = \ln p(c_i) - \frac{1}{2}\ln|\Sigma_i|
      - \frac{1}{2}(X-\mu_i)^{T}\, \Sigma_i^{-1}\, (X-\mu_i)
$$
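As a hedged illustration of how these decision functions might be evaluated, the following Python sketch computes d_i for each class and assigns X to the class with the largest value. The function names, the zero-based class indices, and the two-class, two-feature signatures are assumptions made for the example; they are not taken from the experiment.

```python
import numpy as np

def decision_functions(x, priors, means, covs):
    """d_i = ln p(c_i) - 1/2 ln|S_i| - 1/2 (x - m_i)^T S_i^{-1} (x - m_i) for each class."""
    x = np.asarray(x, dtype=float)
    values = []
    for prior, mean, cov in zip(priors, means, covs):
        mean = np.asarray(mean, dtype=float)
        cov = np.asarray(cov, dtype=float)
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)           # ln|S_i| without forming the determinant
        quad = diff @ np.linalg.solve(cov, diff)     # (x - m_i)^T S_i^{-1} (x - m_i)
        values.append(np.log(prior) - 0.5 * logdet - 0.5 * quad)
    return np.array(values)

def classify(x, priors, means, covs):
    """Assign x to the class with the largest decision function."""
    return int(np.argmax(decision_functions(x, priors, means, covs)))

# Hypothetical signatures: two classes (0 and 1), two texture features each.
priors = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
covs = [np.eye(2), 2.0 * np.eye(2)]

print(classify([0.5, -0.2], priors, means, covs))
print(classify([3.5, 4.2], priors, means, covs))
```

Solving the linear system instead of inverting each covariance matrix is a numerical convenience only; it evaluates the same quadratic form.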


Note that the set of decision functions d_i is composed of constant,
linear, and quadratic terms in X, which means that the multivariate normal
Bayesian classifier places, at most, a second-order surface between pairs of
texture classes.


APPENDIX C


Divergence • Divergence is a measure of the difficulty of distinguishing
between two hypotheses H_1 and H_2.⁷ Note that the hypotheses can be
regarded as “belonging to Class 1 or belonging to Class 2.” Suppose the pattern
vector X can occur in conjunction with two mutually exclusive events c_1
and c_2. From Bayes’ theorem

$$
p(c_1/X) = \frac{p(c_1)\, p(X/c_1)}{p(c_1)\, p(X/c_1) + p(c_2)\, p(X/c_2)}
$$

and

$$
p(c_2/X) = \frac{p(c_2)\, p(X/c_2)}{p(c_1)\, p(X/c_1) + p(c_2)\, p(X/c_2)}.
$$

Dividing the first relation by the second gives

$$
\frac{p(c_1/X)}{p(c_2/X)} = \frac{p(c_1)\, p(X/c_1)}{p(c_2)\, p(X/c_2)}.
$$

Take the logarithm of both sides of the equality and get

$$
\log\frac{p(X/c_1)}{p(X/c_2)} = \log\frac{p(c_1/X)}{p(c_2/X)} - \log\frac{p(c_1)}{p(c_2)}.
$$


Consider the right side of this last equality. The ratio p(c_1/X) / p(c_2/X)
is the odds in favor of c_1 over c_2, given that X has occurred. The ratio
p(c_1) / p(c_2) is the odds in favor of c_1 over c_2 before the observation X
is made.


⁷ Solomon Kullback, Information Theory and Statistics, Dover Publications, Inc., New York, 1968.


Then the logarithm of the likelihood ratio is defined as the information
contained in the observation X for discriminating in favor of c_1 over c_2. The
average information for discriminating in favor of c_1 over c_2 is

$$
I(1:2) = \int p(X/c_1)\, \log\frac{p(X/c_1)}{p(X/c_2)}\, dX.
$$


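Suppose that X is from a multivariate normal distribution and that natural
logarithms are used; then

$$
I(1:2) = \frac{1}{2}\ln\frac{|\Sigma_2|}{|\Sigma_1|}
         + \frac{1}{2}\operatorname{Tr}\, \Sigma_1\left(\Sigma_2^{-1} - \Sigma_1^{-1}\right)
         + \frac{1}{2}\operatorname{Tr}\, \Sigma_2^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)'.
$$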

The details of this derivation can be reviewed in Chapter 9 of Kullback.⁸
Note that Tr pertains to the trace (sum of diagonal elements) of a matrix and
that the prime denotes the transpose. There are several worthwhile properties of
I(1:2) given in the reference. For example, if X and Y are independent
observations, then the amount of information for discriminating in favor of c_1
over c_2 is I(1:2; X, Y) = I(1:2; X) + I(1:2; Y).

If I(1:2) is the amount of information in the observation X for
discriminating in favor of Class c_1 over Class c_2, then by a similar argument,
I(2:1) is the amount of information in X for discriminating in favor of c_2
over c_1.

$$
I(2:1) = \int p(X/c_2)\, \log\frac{p(X/c_2)}{p(X/c_1)}\, dX
$$

or

$$
I(2:1) = -\int p(X/c_2)\, \log\frac{p(X/c_1)}{p(X/c_2)}\, dX.
$$


⁸ Solomon Kullback, Information Theory and Statistics, Dover Publications, Inc., New York, 1968, Chapter 9.


Divergence is defined to be

$$
J(1, 2) = I(1:2) + I(2:1)
$$

or

$$
J(1, 2) = \int \left( p(X/c_1) - p(X/c_2) \right) \log\frac{p(X/c_1)}{p(X/c_2)}\, dX.
$$


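Divergence is a measure of the difficulty of discriminating between Class
c_1 and Class c_2. Note that from symmetry J(1, 2) = J(2, 1). If the
observation X is from a multivariate normal distribution, then

$$
J(1, 2) = \frac{1}{2}\operatorname{Tr}\left(\Sigma_1 - \Sigma_2\right)\left(\Sigma_2^{-1} - \Sigma_1^{-1}\right)
          + \frac{1}{2}\operatorname{Tr}\left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)(\mu_1 - \mu_2)(\mu_1 - \mu_2)'.
$$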


The details of this derivation can be reviewed in Chapter 9 of Kullback.⁹


⁹ Solomon Kullback, Information Theory and Statistics, Dover Publications, Inc., New York, 1968, Chapter 9.


The following properties of J(1, 2) are derived there:


• If X and Y are independent observations, then
J(1, 2; X, Y) = J(1, 2; X) + J(1, 2; Y).

• J(1, 2) ≥ 0.

• The above properties imply that

J(1, 2; X_1, X_2, ..., X_N, X_{N+1}) ≥ J(1, 2; X_1, X_2, ..., X_N).

The last property says that new observations can be evaluated for their
discriminatory power.
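The closed forms quoted above for the multivariate normal case can be checked numerically. The Python sketch below is an illustration only: the function names, the helper `block_diag`, and all means and covariances are hypothetical. It evaluates I(1:2) and J(1, 2), then checks that J(1, 2) = I(1:2) + I(2:1), that J is symmetric and nonnegative, and that appending an independent observation adds its own divergence to the total.

```python
import numpy as np

def _prepare(mu1, cov1, mu2, cov2):
    """Coerce inputs to float arrays; return covariances, inverses, and (mu1-mu2)(mu1-mu2)'."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))
    delta = (mu1 - mu2)[:, None]                     # column vector mu_1 - mu_2
    return cov1, cov2, np.linalg.inv(cov1), np.linalg.inv(cov2), delta @ delta.T

def information(mu1, cov1, mu2, cov2):
    """I(1:2) for two multivariate normal classes, natural logarithms."""
    cov1, cov2, inv1, inv2, outer = _prepare(mu1, cov1, mu2, cov2)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return (0.5 * (logdet2 - logdet1)
            + 0.5 * np.trace(cov1 @ (inv2 - inv1))
            + 0.5 * np.trace(inv2 @ outer))

def divergence(mu1, cov1, mu2, cov2):
    """J(1, 2) for two multivariate normal classes."""
    cov1, cov2, inv1, inv2, outer = _prepare(mu1, cov1, mu2, cov2)
    return (0.5 * np.trace((cov1 - cov2) @ (inv2 - inv1))
            + 0.5 * np.trace((inv1 + inv2) @ outer))

def block_diag(a, b):
    """Joint covariance of two independent groups of features."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    top = np.hstack([a, np.zeros((a.shape[0], b.shape[1]))])
    bottom = np.hstack([np.zeros((b.shape[0], a.shape[1])), b])
    return np.vstack([top, bottom])

# Hypothetical two-feature observation X for classes 1 and 2.
mu1, cov1 = [0.0, 0.0], [[1.0, 0.2], [0.2, 1.0]]
mu2, cov2 = [1.0, 2.0], [[2.0, 0.0], [0.0, 0.5]]
j_x = divergence(mu1, cov1, mu2, cov2)

# Hypothetical single-feature observation Y, independent of X.
j_y = divergence([0.0], [[1.0]], [0.5], [[1.5]])
j_xy = divergence(mu1 + [0.0], block_diag(cov1, [[1.0]]),
                  mu2 + [0.5], block_diag(cov2, [[1.5]]))

print(j_x, j_y, j_xy)
```

Under these assumptions a candidate feature can be screened by computing its divergence alone: an independent feature raises the joint divergence by exactly that amount.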


Large values of J(1, 2) indicate more power in discriminating between
two classes. This can be seen in the following example. Suppose Σ_1 = Σ_2 = Σ
and that a measure of the discriminatory power of an observation for
distinguishing between μ_1 and μ_2 is desired. In this case,

$$
J(1, 2; \mu) = (\mu_1 - \mu_2)'\, \Sigma^{-1}\, (\mu_1 - \mu_2)
$$

and in the case of univariates,

$$
J(1, 2; \mu) = \frac{(\mu_1 - \mu_2)^2}{\sigma^2}.
$$

If the population means are nearly equal, then there is little discriminatory
power in one observation. If the means are very different, then there is more
discriminatory power in a single observation, and J(1, 2; μ) is also larger.
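A short numerical illustration of the equal-covariance case follows. The function name and all values are hypothetical; the sketch simply evaluates (μ_1 - μ_2)' Σ⁻¹ (μ_1 - μ_2) for nearly equal means, for well-separated means, and for a two-feature case.

```python
import numpy as np

def divergence_equal_cov(mu1, mu2, cov):
    """J(1, 2; mu) = (mu1 - mu2)' S^{-1} (mu1 - mu2) when both classes share covariance S."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    delta = mu1 - mu2
    return float(delta @ np.linalg.solve(cov, delta))

# Univariate case with a common variance of 4 (illustrative values only).
near = divergence_equal_cov(10.0, 10.5, 4.0)   # means nearly equal
far = divergence_equal_cov(10.0, 16.0, 4.0)    # means well separated

# Two-feature case with a shared diagonal covariance.
two_d = divergence_equal_cov([0.0, 0.0], [1.0, 1.0], [[2.0, 0.0], [0.0, 0.5]])

print(near, far, two_d)
```

With a common variance of 4, a mean difference of 0.5 gives a divergence of 0.0625, whereas a difference of 6 gives 9.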


APPENDIX  D. 

Confusion Matrices • The classification and relaxation results from the
numerical experiment are listed here. The (i,j)th entry in each table refers to
the percentage of times the ith class was classified as the jth class. For
example, consider the two (6 x 6) matrices of table D1 that pertain to window
size 15. The first matrix M pertains to initial results from maximum likelihood,
and the second matrix R pertains to relaxation results. The entry m_33 = 61.5
indicates that 61.5 percent of class 3 was called correctly, whereas
m_36 = 10.6 says that 10.6 percent of class 3 was called class 6. Similarly,
m_66 = 60.2 says that 60.2 percent of class 6 was called correctly, whereas
m_63 = 3.7 says that 3.7 percent of class 6 was called class 3. There was a
total misclassification error between these two classes of 14.3 percent. The
modified results after relaxation can be determined by reviewing the
corresponding elements of the R matrix.
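The way such a table is read can be sketched in Python. This is a hedged illustration: the function names and the small label lists are hypothetical, and classes are indexed from zero in the code, so class 3 and class 6 of the text correspond to indices 2 and 5. Only the two off-diagonal values quoted above (10.6 and 3.7) are taken from the text.

```python
import numpy as np

def confusion_percent(true_labels, called_labels, n_classes):
    """Entry (i, j): percentage of class i samples that were called class j."""
    counts = np.zeros((n_classes, n_classes))
    for true, called in zip(true_labels, called_labels):
        counts[true, called] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no samples stay at zero instead of triggering a division by zero.
    return 100.0 * np.divide(counts, row_totals,
                             out=np.zeros_like(counts), where=row_totals > 0)

def pairwise_error(matrix, i, j):
    """Total misclassification between classes i and j: m_ij + m_ji."""
    return matrix[i, j] + matrix[j, i]

# Hypothetical labels for a two-class toy case.
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
called_labels = [0, 0, 0, 1, 1, 1, 1, 1, 0]
toy = confusion_percent(true_labels, called_labels, 2)

# The two off-diagonal entries quoted in the text for classes 3 and 6.
m = np.zeros((6, 6))
m[2, 5] = 10.6   # class 3 called class 6
m[5, 2] = 3.7    # class 6 called class 3

print(toy)
print(pairwise_error(m, 2, 5))
```

Each row of a matrix built this way sums to 100 percent, so the diagonal entry is the percentage of that class called correctly.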


TABLE D1.  Confusion  Matrices  For  Scene  A.  PANC


TABLE  D2.  Confusion  Matrices  For  Scene


TABLE  D3.  Confusion  Matrices  For  Scene  B.  PANC


TABLE  D6.  Confusion  Matrices  For  Scene


TABLE  D7.  Confusion  Matrices  For  Scene  E.  PANC


TABLE  D9.  Confusion  Matrices  For  Scene  H.  PANC